Abstract

Parallel algorithms designed for simulation and performance evaluation
of single-server tandem queueing systems with both infinite and
finite buffers are presented. The algorithms exploit a simple computational
procedure based on recursive equations as a representation of
system dynamics. A brief analysis of the performance of the algorithms
is given to show that they involve low time and memory requirements.



1 Introduction 

The simulation of a queueing system is normally an iterative process which 
involves generation of random variables associated with current events in the 
system, and evaluation of the system state variables when new events occur 
[1, 2, 3]. In a system being simulated, the random variables may represent
the interarrival and service times of customers, whereas the arrival and
departure times of customers, and the service initiation and completion
times, can be considered as state variables.

The usual way to represent the dynamics of queueing systems as well as
their performance criteria relies on recursive equations describing the evolution
of system state variables [2, 3, 4, 5, 6]. Since the recursive equations actually
determine the global structure of consecutive changes in the system state
variables, they can serve as a basis for the development of efficient simulation
algorithms [21 EJ [6] . 

In this paper, we assume as in [3, 6] that appropriate realizations of
the random variables involved in simulation are available when required,
and we therefore concentrate only on deterministic parallel algorithms for
evaluating the system state variables from these realizations. Methods and
algorithms of generating random variables and their analysis can be found 
in pp. A thorough investigation of parallel simulation from the viewpoint of 
statistics is given in [H]. 

We present parallel algorithms designed for simulation and performance 
evaluation of open single-server tandem queueing systems with both infinite 
and finite buffers. The algorithms are based on a simple computational 
procedure which exploits a particular order of evaluating the system state 
variables from the related recursive equations, and they are intended for 
implementation on either a vector processor or single instruction, multiple 
data (SIMD) parallel processors [9]. The analysis of their performance shows 
that the algorithms involve low time and memory requirements. 

In Section 2, we give recursive equations which describe the dynamics
of tandem systems with both infinite and finite buffers. Furthermore, tandem
system performance criteria are represented in terms of state variables
involved in the recursive equations. In Section 3, parallel simulation
algorithms are presented and their performance is discussed. A brief conclusion
is given in Section 4.

2 Models of Tandem Queues 

In this section we consider recursive equation based models of tandem queues, 
and give related representation of system performance measures. We start 
with a simple model of a single-server tandem queueing system with infinite 
buffers, and then extend it to more complicated models of systems with 
finite buffers, in which servers may be blocked according to some blocking 
rule. 

2.1 Tandem Queues with Infinite Buffers 

Consider a series of $N$ single-server queues with infinite buffers, depicted
in Fig. 1. An additional queue labelled with 0 is included in the model to
represent the external arrival stream of customers.

Figure 1: Tandem queues with infinite buffers.

Each customer that arrives into the system is initially placed in the buffer
at the 1st server and then has to pass through all the queues one after the
other. Upon the completion of his service at server $n$, the customer is
instantaneously transferred to queue $n + 1$, $n = 1, \ldots, N - 1$, and occupies
the $(n+1)$st server provided that it is free. If the customer finds this server
busy, he is placed in its buffer and has to wait until the service of all his
predecessors is completed.

For each queue $n$ in the system, $n = 0, 1, \ldots, N$, we introduce the
following notation:

$A_n^k$, the $k$th arrival epoch to the queue;

$B_n^k$, the $k$th service initiation time at the queue;

$C_n^k$, the $k$th service completion time at the queue;

$D_n^k$, the $k$th departure epoch from the queue.

Furthermore, let us denote the time between the arrivals of the $k$th customer
and his predecessor to the system by $\tau_0^k$, and the service time of the $k$th
customer at server $n$ by $\tau_n^k$, $n = 1, \ldots, N$, $k = 1, 2, \ldots$. We assume
that $\tau_n^k \ge 0$ are given parameters, whereas $A_n^k$, $B_n^k$, $C_n^k$, and $D_n^k$
represent unknown state variables. Finally, for each $n = 0, \ldots, N$, we define
$D_n^k = 0$ for all $k \le 0$, and $D_{-1}^k = 0$ for all $k = 1, 2, \ldots$.

With the condition that the system starts operating at time zero, and
its servers are free of customers at the initial time, the state variables in the
model can be related by the equations [2, 3, 6]

$$A_n^k = D_{n-1}^k,$$
$$B_n^k = A_n^k \vee D_n^{k-1},$$
$$C_n^k = B_n^k + \tau_n^k,$$
$$D_n^k = C_n^k,$$

where the symbol $\vee$ stands for the maximum operator, $n = 0, 1, \ldots, N$,
$k = 1, 2, \ldots$. Clearly, the above set of recursive equations may be reduced
to two equations

$$B_n^k = D_{n-1}^k \vee D_n^{k-1}, \qquad (1)$$

$$D_n^k = B_n^k + \tau_n^k, \qquad (2)$$

and even to the equation

$$D_n^k = (D_{n-1}^k \vee D_n^{k-1}) + \tau_n^k, \qquad (3)$$

which will provide the basic representations for simulation algorithms in the
next sections.
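As an illustration of equation (3), the following Python sketch evaluates all departure epochs customer by customer. The function name `tandem_departures` and the list-of-lists layout of `tau` are assumptions made for this example only; row 0 of `tau` is assumed to hold the interarrival times and row $n$ the service times at server $n$.

```python
def tandem_departures(tau):
    """Evaluate D[n][k] from equation (3) for an infinite-buffer tandem system.

    tau[0][k-1] is the k-th interarrival time (hypothetical layout);
    tau[n][k-1] is the service time of customer k at server n, n >= 1.
    """
    n_queues = len(tau)          # N + 1 queues, indexed 0..N
    n_customers = len(tau[0])    # K customers
    # Column 0 holds the boundary values D_n^0 = 0.
    D = [[0.0] * (n_customers + 1) for _ in range(n_queues)]
    for k in range(1, n_customers + 1):
        for n in range(n_queues):
            upstream = D[n - 1][k] if n > 0 else 0.0   # D_{-1}^k = 0
            D[n][k] = max(upstream, D[n][k - 1]) + tau[n][k - 1]
    return D
```

For example, with interarrival times `[1, 1, 1]` and service times `[2, 2, 2]` at a single server, customers arrive at times 1, 2, 3 and depart at times 3, 5, 7.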

2.2 Tandem Queues with Finite Buffers 

Suppose now that the buffers of servers in the open tandem system have
finite capacity. Furthermore, we assume that the servers may be blocked
according to some blocking rule. In this paper, we restrict our consideration
to manufacturing blocking and communication blocking which are most
commonly encountered in practice [21 El [5] . 

Let us consider an open tandem system of $N$ queues (Fig. 2), and
assume the buffer at the $n$th server, $n = 1, \ldots, N$, to be of the capacity
$m_n$, $0 \le m_n < \infty$.

Figure 2: Tandem queues with finite buffers.

Manufacturing Blocking. First we suppose that the dynamics of the
system follows the manufacturing blocking rule. Under this type of blocking,
if upon completion of a service, the $n$th server sees the buffer of the $(n+1)$st
server full, it cannot be released and has to remain occupied until the $(n+1)$st
server completes its current service to provide a free space in its buffer.
Clearly, since the customers leave the system upon their service completion
at the $N$th server, this server cannot be blocked.

With the additional condition that $D_n^k = 0$ if $n > N$, one can describe
the dynamics of the system by the equations [21 El El [6] 

$$B_n^k = D_{n-1}^k \vee D_n^{k-1}, \qquad (4)$$
$$C_n^k = B_n^k + \tau_n^k, \qquad (5)$$
$$D_n^k = C_n^k \vee D_{n+1}^{k-m_{n+1}-1}. \qquad (6)$$
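The manufacturing blocking recursions (4)-(6) can be sketched in Python as follows. This is a minimal illustration, not the algorithm of the paper: the function name, the data layout, and the convention that `m[0]` is unused are assumptions of the example. The sketch also assumes that the equations are applied to queue 0 as well, so that the arrival stream can be delayed when the buffer of server 1 is full.

```python
def manufacturing_blocking(tau, m):
    """Evaluate B, C, D from equations (4)-(6).

    tau[0] holds interarrival times, tau[n] the service times at server n.
    m[n] is the buffer capacity at server n, n = 1..N (m[0] is unused).
    """
    N = len(tau) - 1
    K = len(tau[0])
    B = [[0.0] * (K + 1) for _ in range(N + 1)]
    C = [[0.0] * (K + 1) for _ in range(N + 1)]
    D = [[0.0] * (K + 1) for _ in range(N + 1)]

    def dep(n, k):
        # Boundary conditions: D_n^k = 0 for k <= 0, for n = -1, and for n > N.
        if n < 0 or n > N or k <= 0:
            return 0.0
        return D[n][k]

    for k in range(1, K + 1):
        for n in range(N + 1):
            B[n][k] = max(dep(n - 1, k), dep(n, k - 1))        # (4)
            C[n][k] = B[n][k] + tau[n][k - 1]                  # (5)
            lag = k - m[n + 1] - 1 if n < N else 0
            D[n][k] = max(C[n][k], dep(n + 1, lag))            # (6)
    return B, C, D
```

Looping over customers in the outer loop is sufficient here because the blocking term refers to customer $k - m_{n+1} - 1$, which always precedes customer $k$.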

Communication Blocking. This rule does not permit a server to initiate 
service of a customer if the buffer of the next server is full. In that case, 
the server remains unavailable until the current service at the next server is 
completed. 

Let us assume that the system depicted in Fig. 2 follows communication
blocking, and introduce the notation $H_n^k$ to denote the time instant at
which the $n$th server becomes ready to check whether there is empty space
at the buffer of the next server, and to initiate service of customer $k$ if it
is possible. Now the system dynamics may be represented by the equations

$$H_n^k = D_{n-1}^k \vee D_n^{k-1}, \qquad (7)$$

$$B_n^k = H_n^k \vee D_{n+1}^{k-m_{n+1}-1}, \qquad (8)$$
$$D_n^k = B_n^k + \tau_n^k. \qquad (9)$$
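A direct evaluation of the communication blocking recursions (7)-(9) might look like the sketch below. The function name and data layout are hypothetical, `m[0]` is unused, and queue 0 is assumed to obey the same equations as the servers.

```python
def communication_blocking(tau, m):
    """Evaluate H, B, D from equations (7)-(9).

    tau[0] holds interarrival times, tau[n] the service times at server n.
    m[n] is the buffer capacity at server n, n = 1..N (m[0] is unused).
    """
    N = len(tau) - 1
    K = len(tau[0])
    H = [[0.0] * (K + 1) for _ in range(N + 1)]
    B = [[0.0] * (K + 1) for _ in range(N + 1)]
    D = [[0.0] * (K + 1) for _ in range(N + 1)]

    def dep(n, k):
        # Boundary conditions: D_n^k = 0 for k <= 0, for n = -1, and for n > N.
        if n < 0 or n > N or k <= 0:
            return 0.0
        return D[n][k]

    for k in range(1, K + 1):
        for n in range(N + 1):
            H[n][k] = max(dep(n - 1, k), dep(n, k - 1))        # (7)
            lag = k - m[n + 1] - 1 if n < N else 0
            B[n][k] = max(H[n][k], dep(n + 1, lag))            # (8)
            D[n][k] = B[n][k] + tau[n][k - 1]                  # (9)
    return H, B, D
```

Compared with manufacturing blocking, the waiting for free space happens before the service starts, so the difference $B_n^k - H_n^k$ is the time the server spends blocked.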
2.3 Representation of System Performance

Suppose that we observe the system until the $K$th service completion at
server $n$, $1 \le n \le N$. As is customary in queueing system simulation,
we assume that $K > N$. The following average quantities are normally
considered as performance criteria for server $n$ in the observation period:

system time of one customer:  $S_n = \sum_{k=1}^{K} (D_n^k - A_n^k)/K$,

waiting time of one customer:  $W_n = \sum_{k=1}^{K} (B_n^k - A_n^k)/K$,

throughput rate of the server:  $T_n = K/D_n^K$,

utilization of the server:  $U_n = \sum_{k=1}^{K} \tau_n^k / D_n^K$,

number of customers:  $J_n = \sum_{k=1}^{K} (D_n^k - A_n^k)/D_n^K$,

queue length at the server:  $Q_n = \sum_{k=1}^{K} (B_n^k - A_n^k)/D_n^K$.

Clearly, the above criteria are suited to the systems with both infinite
and finite buffers. Furthermore, one can consider the average idle time of
server $n$, which is a criterion inherent only in the systems with finite
buffers. It is defined for the manufacturing and communication blocking
rules respectively as [2, 6]

$$IM_n = \sum_{k=1}^{K} (D_n^k - C_n^k)/K,$$

$$IC_n = \sum_{k=1}^{K} (B_n^k - H_n^k)/K.$$

Note finally that these expressions may also be written in terms of departure
epochs and service times in the same form as

$$I_n = \sum_{k=1}^{K} \left( D_n^k - (D_{n-1}^k \vee D_n^{k-1}) - \tau_n^k \right)/K.$$
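The six criteria above can be computed for one server from its arrival, service initiation, and departure epochs. The sketch below assumes these are supplied as plain Python lists of length $K$; the function name and the dictionary keys are hypothetical.

```python
def performance_criteria(arrivals, starts, departures, services):
    """Average criteria of one server over its first K customers.

    arrivals[k-1], starts[k-1], departures[k-1] hold A_n^k, B_n^k, D_n^k,
    and services[k-1] holds tau_n^k (hypothetical argument layout).
    """
    K = len(departures)
    x = sum(d - a for d, a in zip(departures, arrivals))   # sum of D - A
    y = sum(b - a for b, a in zip(starts, arrivals))       # sum of B - A
    z = sum(services)                                      # sum of tau
    horizon = departures[-1]                               # D_n^K
    return {
        "S": x / K,          # system time of one customer
        "W": y / K,          # waiting time of one customer
        "T": K / horizon,    # throughput rate of the server
        "U": z / horizon,    # utilization of the server
        "J": x / horizon,    # number of customers
        "Q": y / horizon,    # queue length at the server
    }
```

With arrivals at 1, 2, 3, service starts at 1, 3, 5, and departures at 3, 5, 7, the average system time is 3 and the average waiting time is 1.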
3 Tandem Queues Simulation Algorithms



We start with the description of a simple simulation procedure designed for 
the tandem system with infinite buffers, and then extend the procedure to 
algorithms for systems with finite buffers and blocking. It is shown how the 
algorithms can be refined so as to evaluate system performance. In addition, 
time and memory requirements associated with the algorithms are briefly 
discussed. 

3.1 The Basic Simulation Procedure 

We use the procedure proposed in [7], which was designed for the simulation
of the tandem queueing system described by equations (1)-(2). It actually
performs computations of successive state variables $B_n^k$ and $D_n^k$ with indices
being varied in a particular order. According to this order, at each iteration
$i$, the variables with $n + k = i$, $i = 1, 2, \ldots$, have to be evaluated. The next
algorithm shows how to apply this procedure to the simulation of the
first $K$ customers in a tandem queueing system with infinite buffers and
$N$ servers, $K > N$.

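Algorithm 1.

Set $d_i = 0$, $i = -1, 0, \ldots, N$;
for $i = 1, \ldots, K + N$, do
    $j_0 \leftarrow \max(1, i - N)$;
    $J \leftarrow \min(i, K)$;
    for $j = j_0, j_0 + 1, \ldots, J$, do
        $b_{i-j} \leftarrow d_{i-j-1} \vee d_{i-j}$;
        $d_{i-j} \leftarrow b_{i-j} + \tau_{i-j}^{j}$.
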
In Algorithm 1 the variables $b_n$ and $d_n$ serve all the iterations to store
current values of $B_n^k$ and $D_n^k$ respectively, for $k = 1, \ldots, K$. Upon the
completion of the algorithm, we have for server $n$ the $K$th departure time
saved in $d_n$, $n = 0, 1, \ldots, N$.
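The diagonal evaluation order with in-place storage can be sketched in Python as follows. The function name and data layout are assumptions of this example, and a scalar `b` replaces the array $b_n$ because the sequential version needs only one such value at a time.

```python
def algorithm1_in_place(tau):
    """Sequential evaluation along diagonals n + k = i with O(N) storage.

    tau[0] holds interarrival times, tau[n] the service times at server n.
    Returns the K-th departure epochs of queues 0..N.
    """
    N = len(tau) - 1
    K = len(tau[0])
    # d[n + 1] stores d_n for n = -1, 0, ..., N; d_{-1} stays equal to 0.
    d = [0.0] * (N + 2)
    for i in range(1, K + N + 1):
        j0 = max(1, i - N)
        J = min(i, K)
        # Increasing j means decreasing queue index n = i - j, so d_{n-1}
        # is read before it is overwritten with the next customer's value.
        for j in range(j0, J + 1):
            n = i - j
            b = max(d[n], d[n + 1])        # d_{n-1} v d_n
            d[n + 1] = b + tau[n][j - 1]   # d_n <- b_n + tau_n^j
    return d[1:]
```

Reversing the inner loop would overwrite $d_{n-1}$ too early, which is why the evaluation order within an iteration matters for the in-place variant.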

Since one maximization and one addition have to be performed so as to
get new variables $B_n^k$ and $D_n^k$, one can conclude that the entire algorithm
requires $O(2(N + 1)K)$ arithmetic operations without considering index
manipulations. Moreover, the order in which the variables are evaluated
within each iteration is essential for reducing memory used for computations.
It is easy to see that only $O(N + 1)$ memory locations are actually required
with this order, provided only the departure epochs $D_n^k$ are to be calculated.
To illustrate the memory requirements, let us represent Algorithm 1 in
another form as

Algorithm 2.

Set $d_i = 0$, $i = -1, 0, \ldots, N$;
for $i = 1, \ldots, K + N$, do
    $j_0 \leftarrow \max(1, i - N)$;
    $J \leftarrow \min(i, K)$;
    for $j = j_0, j_0 + 1, \ldots, J$, do
        $d_{i-j} \leftarrow (d_{i-j-1} \vee d_{i-j}) + \tau_{i-j}^{j}$.


Finally, we suppose that there is a computer system with either a vector
processor or SIMD parallel processors available for tandem queueing system
simulation. In that case, we can use the following algorithm, which is
actually a simple modification of Algorithm 1.

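Algorithm 3.

Set $d_i = 0$, $i = -1, 0, \ldots, N$;
for $i = 1, \ldots, K + N$, do
    $j_0 \leftarrow \max(1, i - N)$;
    $J \leftarrow \min(i, K)$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $b_{i-j} \leftarrow d_{i-j-1} \vee d_{i-j}$;
        $d_{i-j} \leftarrow b_{i-j} + \tau_{i-j}^{j}$.
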

Let $P$ denote the length of vector registers of the vector processor or
the number of parallel processors, depending on whether a vector or parallel
computer system is available. It is not difficult to see that
Algorithm 3 requires the condition $P \ge N + 1$ to be satisfied. Otherwise,
if $P < N + 1$, one simply has to rearrange computations so as to execute
each iteration in several parallel steps. In other words, all operations within
an iteration should be separated into groups of $P$ operations, and the groups
assigned to sequential steps.

It has been shown in [7] that for any integer $P > 0$, Algorithm 3 requires
$O(2N + 2K + 2\lfloor N/P \rfloor (K - P))$ parallel (vector) operations, where
$\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$. Moreover,
provided that $P = N + 1$, the algorithm achieves linear speedup in relation
to Algorithm 1 as the number of customers $K \to \infty$. Finally, it is easy to
understand that Algorithm 3 entails $O(2(N + 1))$ memory locations.
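The two-phase structure of Algorithm 3 can be emulated in plain Python as sketched below. The function name and the operation counter are hypothetical, and the sketch assumes $P \ge N + 1$, so that each phase of an iteration counts as a single parallel operation. On a real vector machine each phase would be one vector instruction; here list operations stand in for the lanes.

```python
def algorithm3_lockstep(tau):
    """Emulate the SIMD version: all lanes compute b, then all lanes write d.

    tau[0] holds interarrival times, tau[n] the service times at server n.
    Returns the K-th departure epochs and the number of parallel operations,
    counted under the assumption P >= N + 1.
    """
    N = len(tau) - 1
    K = len(tau[0])
    d = [0.0] * (N + 2)      # d[n + 1] stores d_n for n = -1, 0, ..., N
    vector_ops = 0
    for i in range(1, K + N + 1):
        js = range(max(1, i - N), min(i, K) + 1)
        # Phase 1: every lane reads the old values d_{n-1} and d_n, n = i - j.
        b = [max(d[i - j], d[i - j + 1]) for j in js]
        # Phase 2: every lane writes its own d_n, so the order is irrelevant.
        for lane, j in enumerate(js):
            d[i - j + 1] = b[lane] + tau[i - j][j - 1]
        vector_ops += 2
    return d[1:], vector_ops
```

Because all values of $b$ are computed before any $d_n$ is overwritten, the lanes do not interfere, at the price of the extra storage for $b$ that the memory estimate accounts for.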

3.2 Simulation of Queues with Finite Buffers 

Taking equations (4)-(6) as the starting point, we can readily rewrite
Algorithm 3 so as to make it possible to simulate tandem queueing systems with
manufacturing blocking. Let us first introduce the variables $b_n$ and $c_n$
to represent current values of $B_n^k$ and $C_n^k$. Since calculation of $D_n^k$
involves taking account of the value of $D_{n+1}^{k-m_{n+1}-1}$, one has to keep in
memory all values $D_{n+1}^j$ with $j = k - m_{n+1} - 1, k - m_{n+1}, \ldots, k$. Therefore,
we further introduce the variables $d_n^j$ as memory locations of these values,
$n = 1, \ldots, N$, $j = 0, 1, \ldots, m_n$. The locations $d_n^0, d_n^1, \ldots, d_n^{m_n}$ are intended
to be occupied using cyclic overwriting so that the value $D_n^k$ is put into the
location $d_n^j$ with $j = k \bmod (m_n + 1)$, where mod indicates the modulo
operation.

In order to simplify further formulas, we define the index function

$$p(k, n) = \begin{cases} k \bmod (m_n + 1), & \text{if } 1 \le n \le N, \\ 0, & \text{otherwise}, \end{cases}$$

for all $k = 1, 2, \ldots$. Finally, with the variables $d_{-1}^0$, $d_0^0$, and $d_{N+1}^0$ reserved
respectively for $D_{-1}^k$, $D_0^k$, and $D_{N+1}^k$, we have the next parallel algorithm.

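Algorithm 4.

Set $d_{-1}^0, d_0^0, d_{N+1}^0 = 0$;
set $d_i^j = 0$, $i = 1, \ldots, N$, $j = 0, 1, \ldots, m_i$;
for $i = 1, \ldots, K + N$, do
    $j_0 \leftarrow \max(1, i - N)$;
    $J \leftarrow \min(i, K)$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $b_{i-j} \leftarrow d_{i-j-1}^{p(j,i-j-1)} \vee d_{i-j}^{p(j-1,i-j)}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $c_{i-j} \leftarrow b_{i-j} + \tau_{i-j}^{j}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $d_{i-j}^{p(j,i-j)} \leftarrow c_{i-j} \vee d_{i-j+1}^{p(j,i-j+1)}$.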
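The cyclic storage scheme can be illustrated with the following Python sketch for manufacturing blocking. It is a simplification rather than a transcription of the parallel algorithm: the lanes of each iteration are processed one at a time in order of increasing $j$, not in lockstep. The function name, the `slots` layout, and the convention that `m[0]` is unused are assumptions of the example.

```python
def manufacturing_blocking_cyclic(tau, m):
    """Diagonal-order evaluation with cyclic buffers of m_n + 1 slots.

    tau[0] holds interarrival times, tau[n] the service times at server n.
    m[n] is the buffer capacity at server n, n = 1..N (m[0] is unused).
    Returns the K-th departure epochs of queues 0..N.
    """
    N = len(tau) - 1
    K = len(tau[0])

    def cap(n):
        return m[n] if 1 <= n <= N else 0

    def p(k, n):
        # Index function: k mod (m_n + 1) for 1 <= n <= N, and 0 otherwise.
        return k % (cap(n) + 1)

    # slots[n + 1] is the cyclic storage of queue n for n = -1, 0, ..., N + 1.
    slots = [[0.0] * (cap(n) + 1) for n in range(-1, N + 2)]
    for i in range(1, K + N + 1):
        for j in range(max(1, i - N), min(i, K) + 1):
            n = i - j
            b = max(slots[n][p(j, n - 1)], slots[n + 1][p(j - 1, n)])
            c = b + tau[n][j - 1]
            # Slot p(j, n+1) of queue n + 1 still holds D_{n+1}^{j-m_{n+1}-1}.
            slots[n + 1][p(j, n)] = max(c, slots[n + 2][p(j, n + 1)])
    return [slots[n + 1][p(K, n)] for n in range(N + 1)]
```

The key observation is that the slot about to receive $D_{n+1}^k$ is exactly the one that currently stores $D_{n+1}^{k-m_{n+1}-1}$, since both indices agree modulo $m_{n+1} + 1$. With the sequential order used here, that slot holds the required value even when $m_{n+1} = 0$.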

Consider now equations (7)-(9) which describe the dynamics of the tandem
system operating under the communication blocking rule. With the
variables $h_n$, $n = 0, 1, \ldots, N$, used as storage for the values of $H_n^k$, it is
easy to arrive at

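Algorithm 5.

Set $d_{-1}^0, d_0^0, d_{N+1}^0 = 0$;
set $d_i^j = 0$, $i = 1, \ldots, N$, $j = 0, 1, \ldots, m_i$;
for $i = 1, \ldots, K + N$, do
    $j_0 \leftarrow \max(1, i - N)$;
    $J \leftarrow \min(i, K)$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $h_{i-j} \leftarrow d_{i-j-1}^{p(j,i-j-1)} \vee d_{i-j}^{p(j-1,i-j)}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $b_{i-j} \leftarrow h_{i-j} \vee d_{i-j+1}^{p(j,i-j+1)}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $d_{i-j}^{p(j,i-j)} \leftarrow b_{i-j} + \tau_{i-j}^{j}$.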

In fact, both algorithms differ from Algorithm 3 in that at every iteration,
they involve three parallel operations each, whereas the latter does two
operations. Therefore, we may extend the above estimate of time requirements
for Algorithm 3 to Algorithm 4 and Algorithm 5, which then becomes
$O(3N + 3K + 3\lfloor N/P \rfloor (K - P))$. The number of memory locations now
involved in computations can be evaluated as $O(3(N + 1) + M + 1)$, where
$M = m_1 + \cdots + m_N$.

3.3 Evaluation of Performance Criteria 

In order to present a modification of Algorithm 3 suitable for the evaluation
of the tandem system performance criteria introduced in the previous section,
first define the additional variables $x_n$, $y_n$, and $z_n$, $n = 0, 1, \ldots, N$,
to represent the memory locations which are to store current values of the
sums

$$\sum_{i=1}^{k} (D_n^i - A_n^i), \qquad \sum_{i=1}^{k} (B_n^i - A_n^i), \qquad \sum_{i=1}^{k} \tau_n^i,$$

respectively. Taking into account that in the tandem systems with both
infinite and finite buffers, we have $A_n^k = D_{n-1}^k$ for all $n = 0, 1, \ldots, N$, and
$k = 1, 2, \ldots$, we may write the following parallel algorithm.

Algorithm 6.

Set $d_i = 0$, $i = -1, 0, \ldots, N$;
set $x_i, y_i, z_i = 0$, $i = 0, 1, \ldots, N$;
for $i = 1, \ldots, K + N$, do
    $j_0 \leftarrow \max(1, i - N)$;
    $J \leftarrow \min(i, K)$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $x_{i-j} \leftarrow x_{i-j} - d_{i-j-1}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $y_{i-j} \leftarrow y_{i-j} - d_{i-j-1}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $b_{i-j} \leftarrow d_{i-j-1} \vee d_{i-j}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $d_{i-j} \leftarrow b_{i-j} + \tau_{i-j}^{j}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $x_{i-j} \leftarrow x_{i-j} + d_{i-j}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $y_{i-j} \leftarrow y_{i-j} + b_{i-j}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $z_{i-j} \leftarrow z_{i-j} + \tau_{i-j}^{j}$.

As is easy to see, Algorithm 6 requires $O(7N + 7K + 7\lfloor N/P \rfloor (K - P))$
parallel operations, and involves $O(5(N + 1))$ memory locations. Upon the
completion of the algorithm, the performance criteria associated with each
queue $n$, $n = 1, \ldots, N$, can be calculated as

$$S_n = x_n/K, \qquad W_n = y_n/K, \qquad T_n = K/d_n,$$

$$U_n = z_n/d_n, \qquad J_n = x_n/d_n, \qquad Q_n = y_n/d_n.$$
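The accumulation of the sums during the diagonal sweep can be sketched in Python as follows for the infinite-buffer system. The function name and the returned dictionary are assumptions of this example; each inner loop stands for one of the seven parallel operations of an iteration.

```python
def algorithm6_criteria(tau):
    """Accumulate x, y, z along the diagonals and derive the criteria.

    tau[0] holds interarrival times, tau[n] the service times at server n.
    Returns a dictionary of criteria for each server n = 1..N.
    """
    N = len(tau) - 1
    K = len(tau[0])
    d = [0.0] * (N + 2)      # d[n + 1] stores d_n for n = -1, 0, ..., N
    x = [0.0] * (N + 1)      # running sums of D - A
    y = [0.0] * (N + 1)      # running sums of B - A
    z = [0.0] * (N + 1)      # running sums of tau
    for i in range(1, K + N + 1):
        lanes = [(i - j, j) for j in range(max(1, i - N), min(i, K) + 1)]
        for n, k in lanes:
            x[n] -= d[n]                  # subtract A_n^k = D_{n-1}^k
        for n, k in lanes:
            y[n] -= d[n]
        b = {n: max(d[n], d[n + 1]) for n, k in lanes}
        for n, k in lanes:
            d[n + 1] = b[n] + tau[n][k - 1]
        for n, k in lanes:
            x[n] += d[n + 1]              # add D_n^k
        for n, k in lanes:
            y[n] += b[n]                  # add B_n^k
        for n, k in lanes:
            z[n] += tau[n][k - 1]
    return {
        n: {"S": x[n] / K, "W": y[n] / K, "T": K / d[n + 1],
            "U": z[n] / d[n + 1], "J": x[n] / d[n + 1], "Q": y[n] / d[n + 1]}
        for n in range(1, N + 1)
    }
```

The arrival epochs are subtracted before the departure epochs are updated because $d_{n-1}$ holds $A_n^k$ only at the start of the iteration.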



One can modify both Algorithm 4 and Algorithm 5 to provide performance
evaluation in tandem queueing systems with finite buffers in an analogous
way. Specifically, the next two algorithms are intended to compute the
average idle time of each server in the system. The first one, based on
Algorithm 4, is designed for the system operating under the manufacturing
blocking rule.

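Algorithm 7.

Set $d_{-1}^0, d_0^0, d_{N+1}^0 = 0$;
set $d_i^j = 0$, $i = 1, \ldots, N$, $j = 0, 1, \ldots, m_i$;
set $x_i = 0$, $i = 0, 1, \ldots, N$;
for $i = 1, \ldots, K + N$, do
    $j_0 \leftarrow \max(1, i - N)$;
    $J \leftarrow \min(i, K)$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $b_{i-j} \leftarrow d_{i-j-1}^{p(j,i-j-1)} \vee d_{i-j}^{p(j-1,i-j)}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $c_{i-j} \leftarrow b_{i-j} + \tau_{i-j}^{j}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $x_{i-j} \leftarrow x_{i-j} - c_{i-j}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $d_{i-j}^{p(j,i-j)} \leftarrow c_{i-j} \vee d_{i-j+1}^{p(j,i-j+1)}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $x_{i-j} \leftarrow x_{i-j} + d_{i-j}^{p(j,i-j)}$.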

The variable $x_n$ inserted in Algorithm 7 serves for each $n$, $n = 0, 1, \ldots, N$,
to represent current values of the sums $\sum_{i=1}^{k} (D_n^i - C_n^i)$. Upon the
completion of the algorithm, one can calculate $x_n/K$ which gives the value
of $IM_n$. The time and memory costs can be estimated respectively as
$O(5N + 5K + 5\lfloor N/P \rfloor (K - P))$ and $O(4(N + 1) + M + 1)$.
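A sequential Python sketch of the idle time computation under manufacturing blocking is given below. As before, the function name and data layout are hypothetical, `m[0]` is unused, and the lanes of each iteration are processed one at a time in order of increasing $j$ instead of in lockstep.

```python
def manufacturing_idle_time(tau, m):
    """Average idle (blocked) time IM_n for queues n = 0..N.

    tau[0] holds interarrival times, tau[n] the service times at server n.
    m[n] is the buffer capacity at server n, n = 1..N (m[0] is unused).
    """
    N = len(tau) - 1
    K = len(tau[0])

    def cap(n):
        return m[n] if 1 <= n <= N else 0

    def p(k, n):
        return k % (cap(n) + 1)

    # slots[n + 1] is the cyclic storage of queue n for n = -1, 0, ..., N + 1.
    slots = [[0.0] * (cap(n) + 1) for n in range(-1, N + 2)]
    x = [0.0] * (N + 1)                  # running sums of D - C
    for i in range(1, K + N + 1):
        for j in range(max(1, i - N), min(i, K) + 1):
            n = i - j
            b = max(slots[n][p(j, n - 1)], slots[n + 1][p(j - 1, n)])
            c = b + tau[n][j - 1]
            x[n] -= c                    # subtract C_n^k
            slots[n + 1][p(j, n)] = max(c, slots[n + 2][p(j, n + 1)])
            x[n] += slots[n + 1][p(j, n)]   # add D_n^k
    return [x[n] / K for n in range(N + 1)]
```

In the example used in the test, server 1 feeds a slow server 2 with no buffer, so it stays blocked for 4 time units on each of its last two customers.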

Algorithm 8.

Set $d_{-1}^0, d_0^0, d_{N+1}^0 = 0$;
set $d_i^j = 0$, $i = 1, \ldots, N$, $j = 0, 1, \ldots, m_i$;
set $x_i = 0$, $i = 0, 1, \ldots, N$;
for $i = 1, \ldots, K + N$, do
    $j_0 \leftarrow \max(1, i - N)$;
    $J \leftarrow \min(i, K)$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $h_{i-j} \leftarrow d_{i-j-1}^{p(j,i-j-1)} \vee d_{i-j}^{p(j-1,i-j)}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $x_{i-j} \leftarrow x_{i-j} - h_{i-j}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $b_{i-j} \leftarrow h_{i-j} \vee d_{i-j+1}^{p(j,i-j+1)}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $x_{i-j} \leftarrow x_{i-j} + b_{i-j}$;
    in parallel, for $j = j_0, j_0 + 1, \ldots, J$, do
        $d_{i-j}^{p(j,i-j)} \leftarrow b_{i-j} + \tau_{i-j}^{j}$.

With the same time and memory requirements as for the previous algorithm,
Algorithm 8 allows one to evaluate the average idle time of each
server in tandem queues with communication blocking. It produces the
sums $\sum_{k=1}^{K} (B_n^k - H_n^k)$ stored in $x_n$, $n = 1, \ldots, N$, which can be used in
calculation of the criteria $IC_n$ with the expression $x_n/K$.



4 Conclusions 

Parallel algorithms which offer a quite simple and efficient way of simulating
tandem queueing systems have been proposed. It has been shown that the
algorithms involve low time and memory requirements. Specifically, one can
conclude that the parallel simulation of the first $K$ customers in a system
with $N$ queues requires the time of order $O(L(N + K + \lfloor N/P \rfloor (K - P)))$,
where $P$ is the number of processors, and $L$ is a small constant comparable
with the number of the performance criteria being evaluated. Note, however,
that this estimate ignores the time required for computing indices,
and allocating and moving data, which can have an appreciable effect on
the performance of parallel algorithms in practice.
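To get a feeling for the estimate, the expression under the $O(\cdot)$ sign can be evaluated and compared with the $2(N + 1)K$ operations of the sequential procedure. The sketch below treats the order-of-magnitude expressions as if they were exact counts, which is an assumption of the example; the function names are hypothetical.

```python
def parallel_ops(N, K, P, L=2):
    """Value of L(N + K + floor(N/P)(K - P)), read as an operation count."""
    return L * (N + K + (N // P) * (K - P))


def sequential_ops(N, K):
    """Value of 2(N + 1)K, the sequential count for the basic procedure."""
    return 2 * (N + 1) * K


# Ratio of the two counts for N = 7 servers and P = N + 1 = 8 processors.
ratio = sequential_ops(7, 10**6) / parallel_ops(7, 10**6, 8)
```

For $P = N + 1$ and large $K$ the ratio approaches $N + 1$, in line with the linear speedup stated for Algorithm 3.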